Learning objective
Concept
Assumptions
Possible forms of discontinuity
Linear regression model
where Zi is the assignment variable; c is the cut-off
Estimation strategy
Different Kernel function
Bandwidth = 2
alc <- read.csv("Data/examplealcohol.csv")
alc$MLDA <- 1*(alc$month>=0) #assign ppl to 2 groupsSuppose we hypothesize that there could be intercept and slope change due to MLDA. Fit a model for regression discontinuity design (global approach): \(suicide = \alpha + \beta_1MLDA_i + \beta_2month_i + \beta_3MLDA_i \times month_i + \varepsilon_i\)
## Fit a global model
rdl <- lm(alc.suicide ~ MLDA + month + MLDA*month, data=alc)
summary(rdl)
# There was a 0.14 (95% CI 0.04 0.24) excess alcohol-related suicide per
# 1000 hospital episodes among youths just older than MLDA
## The is no significant change in slope and the interaction term can be dropped
rd1b <- lm(alc.suicide ~ MLDA + month, data=alc)
summary(rd1b)
# There was a 0.14 (95% CI 0.04 0.24) excess alcohol-related suicide per
# 1000 hospital episodes among youths just older than MLDA##
## Call:
## lm(formula = alc.suicide ~ MLDA + month + MLDA * month, data = alc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.287538 -0.066763 0.000769 0.062755 0.300138
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3228889 0.0366951 8.799 6.92e-13 ***
## MLDA 0.1427157 0.0505295 2.824 0.00619 **
## month -0.0007748 0.0017295 -0.448 0.65557
## MLDA:month -0.0011432 0.0023971 -0.477 0.63494
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1078 on 69 degrees of freedom
## Multiple R-squared: 0.1822, Adjusted R-squared: 0.1466
## F-statistic: 5.123 on 3 and 69 DF, p-value: 0.002942
##
##
## Call:
## lm(formula = alc.suicide ~ MLDA + month, data = alc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.280099 -0.071606 0.004261 0.064833 0.305791
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.311880 0.028366 10.995 < 2e-16 ***
## MLDA 0.143859 0.050193 2.866 0.00548 **
## month -0.001370 0.001191 -1.150 0.25395
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1072 on 70 degrees of freedom
## Multiple R-squared: 0.1795, Adjusted R-squared: 0.156
## F-statistic: 7.655 on 2 and 70 DF, p-value: 0.000985
Plot of data; Note the discontinuity in intercept
Non-linear/curvilinear relationship might explain the gap => have polynomial like 2nd or 3rd degree
I(): treat variable ‘as is’, to differentiate from
formula operators (used to stop operator characters (^,+,*, etc) from
being interpreted as they are in a formula). For example
plot(y~x^3) vs plot(y~I(x^3))poly(): compute orthogonal polynomials to avoid
multicollinearityrd1c <- lm(alc.suicide ~ MLDA + month + I(month^2) + I(month^3), data=alc)
#OR
rd1c <- lm(alc.suicide ~ MLDA + poly(month, degree=3), data=alc)
# The estimate changed slightly from 0.14 to 0.12 excess alcohol-related suicide per 1000 # hospital episodes among youths just older than MLDA (though less precise) (no longer
# significant)Plot of data
Local linear/polynomial regression
Package: np
Usually involve two steps: (1) computing the bandwidth; (2) fitting the
local polynomial regression model
npregbw(y ~ x, bws, ckertype, regtype, bandwidth.compute=T, data)
npreg(bws)
require(np)
data <- read.csv ("Data/digoxin.csv")
plot(data$age, data$bmi, type="p", xlab="Age", ylab="BMI", las=1)## Local linear regression
bmi.bw <- npregbw(bmi~age, bandwidth.compute=TRUE,ckertype="gaussian", regtype="ll", data=data)
bmi.lp <- npreg(bws = bmi.bw)
## Visualize regression
with(data, plot(age[order(age)], predict(bmi.lp)[order(age)], type='l', las=1,
xlab="Age", ylab="BMI", xlim=c(30,90), ylim=c(15,45)))
with(data, points(age, bmi, pch=19, col="red"))##
Multistart 1 of 1 |
Multistart 1 of 1 |
Multistart 1 of 1 |
Multistart 1 of 1 /
Multistart 1 of 1 |
Multistart 1 of 1 |
Bandwidth importance
Local linear/polynomial regression
Packages: rdrobust
rdrobust(y, x, c = 0, p = 1, h = NULL, kernel = "tri", bwselect = "mserd", all = FALSE)
library(rdrobust)
rd.l <- rdrobust(alc$alc.suicide, alc$month, all=T)
rd.l #not much useful info## Call: rdrobust
##
## Number of Obs. 73
## BW type mserd
## Kernel Triangular
## VCE method NN
##
## Number of Obs. 36 37
## Eff. Number of Obs. 9 10
## Order est. (p) 1 1
## Order bias (q) 2 2
## BW est. (h) 9.240 9.240
## BW bias (b) 14.336 14.336
## rho (h/b) 0.645 0.645
## Unique Obs. 36 37
summary(rd.l)## Sharp RD estimates using local polynomial regression.
##
## Number of Obs. 73
## BW type mserd
## Kernel Triangular
## VCE method NN
##
## Number of Obs. 36 37
## Eff. Number of Obs. 9 10
## Order est. (p) 1 1
## Order bias (q) 2 2
## BW est. (h) 9.240 9.240
## BW bias (b) 14.336 14.336
## rho (h/b) 0.645 0.645
## Unique Obs. 36 37
##
## =============================================================================
## Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
## =============================================================================
## Conventional 0.098 0.156 0.632 0.528 [-0.207 , 0.403]
## Bias-Corrected 0.078 0.156 0.501 0.616 [-0.227 , 0.383]
## Robust 0.078 0.194 0.402 0.687 [-0.302 , 0.458]
## =============================================================================
# There was an estimated 0.08 (95% CI -0.30 0.46) excess alcohol-related
# suicide per 1000 hospital episodes among youths just older than MLDA
# Bias correction is needed due to the bandwidth selection.
# Note the wide 95% CI
# Selected bandwidth was about h = 9 which significantly reduced the sample size## Set h widest and kenrel as uniform to use global approach
rd.g <- rdrobust(alc$alc.suicide, alc$month, h=100, kernel='uniform', all=T)
summary(rd.g)## Sharp RD estimates using local polynomial regression.
##
## Number of Obs. 73
## BW type Manual
## Kernel Uniform
## VCE method NN
##
## Number of Obs. 36 37
## Eff. Number of Obs. 36 37
## Order est. (p) 1 1
## Order bias (q) 2 2
## BW est. (h) 100.000 100.000
## BW bias (b) 100.000 100.000
## rho (h/b) 1.000 1.000
## Unique Obs. 36 37
##
## =============================================================================
## Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
## =============================================================================
## Conventional 0.143 0.063 2.248 0.025 [0.018 , 0.267]
## Bias-Corrected 0.127 0.063 1.997 0.046 [0.002 , 0.251]
## Robust 0.127 0.101 1.250 0.211 [-0.072 , 0.326]
## =============================================================================
# The estimated effect is 0.14 (95% CI = 0.02 0.27), the bias is caused by the bandwidth
# selection, therefore conventional approach could be used when the bandwidth is pre-defined.Known and universal principle for treatment assignment
Universal principle for treatment assignment. Make sure no manipulation of treatment status (no ‘bunching’)
Continuity in potential outcomes at the cut-off.
Visual inspection will inform the model choice
Correct specification of model - Sensitivity analysis by allowing curvilinear relation/local regression (allow for non-linear relationship by polynomials + global/local approach)
Some subjects may not receive the assigned treatment. So treatment was partially defined by the assignment variable. Treatment effect can be estimated by two-stage least square (2SLS) method:
The assignment variable as instrumental variable. Can be fitted using the “fuzzy” option in rdrobust.